Serialization - Encoding

Serialization

  • Represents complex data structures (like structs, maps, arrays) as a sequence of bytes for transmission or storage.
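As a minimal sketch of that definition, the snippet below turns a structured value into bytes and back. The `Player` type and its fields are hypothetical, and JSON over UTF-8 is just one possible byte representation, chosen here because it needs only the standard library.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Player:  # hypothetical structure used only for illustration
    name: str
    hp: int
    position: list[float]


def serialize(player: Player) -> bytes:
    """Structure -> bytes (JSON text encoded as UTF-8)."""
    return json.dumps(asdict(player)).encode("utf-8")


def deserialize(data: bytes) -> Player:
    """Bytes -> structure, the inverse of serialize()."""
    return Player(**json.loads(data.decode("utf-8")))


original = Player(name="ana", hp=90, position=[1.5, -2.0])
wire = serialize(original)    # bytes ready for a socket or a file
restored = deserialize(wire)  # equal to the original value
```

The formats listed in the rest of these notes differ mainly in what the `wire` bytes look like and how much work the two functions do.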

Binary

CBOR (Concise Binary Object Representation)
  • CBOR

  • RFC 8949

  • First published in 2013 as RFC 7049; RFC 8949 (2020) is the current revision

  • Why unpopular :

    • CBOR emerged in niche areas (IoT, security) outside mainstream web ecosystems.

    • Unlike JSON (native support), CBOR relies on external libraries in most languages.

    • Lacks backing from major platforms compared to Protobuf, JSON, or Avro.

    • Binary format is not human-readable, complicating debugging.

    • Few official serializers, almost no widely used CLI or visual tools.

    • Adds complexity not always needed.

  • RPC :

    • CBOR doesn't implement RPC itself, but it can serve as the payload format inside an RPC system, in the same way JSON is the payload of JSON-RPC and Protobuf is the payload of gRPC.
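To make the binary layout concrete, here is a hand-rolled sketch of a CBOR encoder for a small subset of the data model (unsigned integers, text strings, arrays, maps). It is not a full implementation of RFC 8949, and the function names are made up for this note; a real project would use an existing library.

```python
import struct


def _head(major: int, value: int) -> bytes:
    """Initial byte (3-bit major type + 5-bit additional info) plus any argument bytes."""
    if value < 24:
        return bytes([(major << 5) | value])  # small values fit in the initial byte
    sizes = ((24, ">B", 1 << 8), (25, ">H", 1 << 16), (26, ">I", 1 << 32), (27, ">Q", 1 << 64))
    for info, fmt, limit in sizes:
        if value < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise ValueError("value does not fit in 64 bits")


def cbor_encode(obj) -> bytes:
    """Encode a subset of Python values; anything else raises TypeError."""
    if isinstance(obj, int) and not isinstance(obj, bool) and obj >= 0:
        return _head(0, obj)  # major type 0: unsigned integer
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw  # major type 3: text string
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(cbor_encode(item) for item in obj)  # major type 4: array
    if isinstance(obj, dict):
        body = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
        return _head(5, len(obj)) + body  # major type 5: map
    raise TypeError(f"type not covered by this sketch: {type(obj).__name__}")


encoded = cbor_encode({"a": 1, "b": [2, 3]})
# encoded.hex() == "a26161016162820203", 9 bytes
```

The same value as compact JSON text, `{"a":1,"b":[2,3]}`, takes 17 bytes.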

Custom Binary Format
  • Advantages :

    • Extremely Efficient: Can optimize exactly what you need to send.

    • Full Control: Avoid extra bytes, sending minimal data.

  • Disadvantages :

    • Implementation & Maintenance Complexity: Requires skill and time, harder to modify later.
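A rough sketch of the trade-off, assuming a hypothetical player-state message with a fixed layout (the field names, sizes, and units are invented for the example):

```python
import json
import struct

# Hypothetical layout, little-endian, no padding:
# player id (uint16) | x in cm (int16) | y in cm (int16) | hp (uint8)
STATE = struct.Struct("<HhhB")


def pack_state(player_id: int, x_cm: int, y_cm: int, hp: int) -> bytes:
    """Write exactly the fields the receiver needs, and nothing else."""
    return STATE.pack(player_id, x_cm, y_cm, hp)


def unpack_state(data: bytes) -> tuple:
    """Inverse of pack_state(); both sides must agree on the layout."""
    return STATE.unpack(data)


packed = pack_state(7, 1250, -300, 90)  # 7 bytes
as_json = json.dumps({"id": 7, "x": 1250, "y": -300, "hp": 90}).encode("utf-8")
```

The packed form is 7 bytes and the JSON form is several times larger. The cost is the one listed above: any change to the layout (a new field, a wider integer) has to be made by hand on both sender and receiver.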

Cap'n Proto
FlatBuffers
  • FlatBuffers

  • 2014, developed by Google

  • Advantages :

    • Direct Access: Can read data without deserialization, boosting performance in games needing immediate data.

    • Compact and Fast: Produces small, fast-to-read files.

  • Disadvantages :

    • Implementation Complexity: Setup and usage can be tricky for newcomers.
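The "direct access" idea can be illustrated without FlatBuffers itself. The sketch below is not the FlatBuffers wire format (which uses offset tables rather than a fixed record); it only shows what reading one field straight out of a received buffer looks like, under an assumed fixed layout.

```python
import struct

# Assumed layout for illustration: id (uint32) | x (float32) | y (float32) | hp (uint16)
RECORD = struct.Struct("<IffH")
HP_OFFSET = 12  # 4 (id) + 4 (x) + 4 (y)


def read_hp(buffer, index: int) -> int:
    """Read only the hp field of record `index`; no other field is decoded."""
    (hp,) = struct.unpack_from("<H", buffer, index * RECORD.size + HP_OFFSET)
    return hp


# Two records back to back, wrapped in a memoryview so that slicing would not copy.
buffer = memoryview(RECORD.pack(1, 0.5, 2.0, 80) + RECORD.pack(2, 3.0, 4.0, 55))
hp_of_second = read_hp(buffer, 1)  # 55
```

Formats that require a full decode step would instead build every object in the message before any field can be read.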

Protocol Buffers (Protobuf)
  • Protobuf

  • Protobuf GitHub

  • 2008 (v2) / 2016 (v3)

  • Internal since 2001, public 2008

  • Developed by Google

  • Example of using Protobuf

  • Advantages :

    • Compact: Generates much smaller data packets than JSON/XML, reducing bandwidth usage.

    • Flexible Structure: Allows defining data schemas for multiple programming languages.

    • Performance: Fast serialization/deserialization reduces client/server load.

  • Disadvantages :

    • Schema Dependency: Requires a .proto schema file, adding a development step.

  • Bindings :
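Much of the compactness listed under the advantages comes from the wire format: field names never travel, only a small numeric tag, and integers use a variable-length encoding. The sketch below hand-encodes a single integer field to show the idea; it is an illustration, not a replacement for the code generated from a `.proto` file, and the function names are made up here.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Tag (field number << 3 | wire type 0) followed by the varint value."""
    tag = (field_number << 3) | 0  # wire type 0 means varint
    return encode_varint(tag) + encode_varint(value)


message = encode_varint_field(1, 150)
# message.hex() == "089601", 3 bytes for a message whose field 1 holds 150
```

The receiver needs the schema to know that field 1 is, say, `score`; that is the schema dependency listed under the disadvantages.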

MessagePack
  • MessagePack

  • 2008

  • Advantages :

    • Compact and Fast: Nearly as efficient as Protobuf, often 50% smaller than JSON.

    • Familiar Structure: Similar to JSON, easy to implement.

  • Disadvantages :

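    • Less Optimized than Protobuf: It is schema-less, so map keys are sent with every message, and it lacks some of the space/performance optimizations a schema-based format allows.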
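To show how close the structure is to JSON, here is a hand-written sketch of a MessagePack encoder covering only the smallest cases (booleans, integers 0 to 127, strings up to 31 bytes, maps up to 15 entries). The function name is invented for this note; real code would use an existing MessagePack library.

```python
import json


def msgpack_encode(obj) -> bytes:
    """Encode a tiny subset of MessagePack; anything else raises an error."""
    if obj is True:
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int) and 0 <= obj <= 127:
        return bytes([obj])  # positive fixint: the value is the byte
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("only fixstr (up to 31 bytes) is covered")
        return bytes([0xA0 | len(raw)]) + raw  # fixstr: length in the low 5 bits
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("only fixmap (up to 15 entries) is covered")
        body = b"".join(msgpack_encode(k) + msgpack_encode(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body  # fixmap: entry count in the low 4 bits
    raise TypeError(f"type not covered by this sketch: {type(obj).__name__}")


value = {"compact": True, "schema": 0}
packed = msgpack_encode(value)                                     # 18 bytes
as_json = json.dumps(value, separators=(",", ":")).encode("utf-8")  # 27 bytes
```

The keys `compact` and `schema` are still present in the packed bytes, which is the main reason a schema-based format can go smaller.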

Thrift
  • Thrift

  • Originally developed at Facebook (open-sourced in 2007), now maintained by the Apache Software Foundation

  • Advantages :

    • Compact and Flexible: Smaller than JSON/XML.

    • RPC Support: Integrated remote procedure calls.

  • Disadvantages :

    • Higher Complexity: Requires more setup and learning.

Karmem
SQLite .db
  • SQLite is embedded; each game instance has its own .db file, complicating synchronization between players.

Godot: Built-In Encoding with PacketPeer
  • Advantages : Integrated into Godot networking, supports common data types, simple and efficient for small/moderate network games.

  • Disadvantages : Less control and advanced optimization than Protobuf/FlatBuffers.

  • Recommendation: Good for small/medium games needing quick state synchronization without extreme optimization.

Godot: Custom Encoding
  • Creating a custom network encoding system

    • Highly optimized, but requires more work per data type.

    • Example: Sending a vector as 1–8 instead of 20 bytes.

  • `var_to_bytes([type, node_path, method_stringname, array_params])`

  • Progressive encoding:

    • `var_to_bytes(type)`

    • `var_to_bytes(type) + var_to_bytes(node_path)`

    • `var_to_bytes(type) + var_to_bytes(node_path) + var_to_bytes(method_stringname)`

    • `var_to_bytes(type) + var_to_bytes(node_path) + var_to_bytes(method_stringname) + var_to_bytes(array_params)`
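The "vector as 1–8 bytes" idea is independent of the engine, so the sketch below uses Python rather than GDScript. It assumes two hypothetical cases: a unit direction that tolerates some rounding (1 byte) and a 2D position sent as two 32-bit floats (8 bytes). None of these helper names exist in Godot.

```python
import math
import struct


def pack_direction(x: float, y: float) -> bytes:
    """Quantize a 2D direction to one of 256 angles and store it in 1 byte."""
    angle = math.atan2(y, x) % (2 * math.pi)
    step = round(angle / (2 * math.pi) * 256) % 256
    return struct.pack("B", step)


def unpack_direction(data: bytes) -> tuple[float, float]:
    """Rebuild an approximate unit vector from the 1-byte angle step."""
    (step,) = struct.unpack("B", data)
    angle = step / 256 * 2 * math.pi
    return math.cos(angle), math.sin(angle)


def pack_position(x: float, y: float) -> bytes:
    """Two little-endian float32 values: 8 bytes, no type header."""
    return struct.pack("<ff", x, y)


up = pack_direction(0.0, 1.0)    # 1 byte
dx, dy = unpack_direction(up)    # approximately (0.0, 1.0)
pos = pack_position(12.5, -3.0)  # 8 bytes
```

With 256 steps the direction is off by at most about 0.7 degrees, which is the "more work per data type" part: each type needs its own decision about how much precision can be dropped.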

Non-Binary

  • Formats like JSON/XML are common but not ideal for games: field names, quotes, and delimiters add bytes to every message.

JSON
  • 2001

  • Standardized as ECMA-404 (2013) and RFC 8259 (2017)

Encoding

  • Maps characters (text) to byte sequences and vice versa. Used for text fields in serialization.

Naming Confusion
  • When someone says "custom packet encoding", they usually mean one of:

    • A framing protocol (how the start and end of each message are delimited).

    • A custom serialization/deserialization strategy.

    • A binary or textual format for transmitting structures over the network.

  • Using "encoding" for packet framing strategies is technically valid but potentially ambiguous.

  • In networking, it's better to use more specific terms such as "framing", "serialization", or "wire format".

  • The word "encoding" itself isn't wrong but should be interpreted in the technical context.

  • Odin's core library groups JSON and CBOR under "encoding".
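To separate framing from serialization, the sketch below implements one common framing scheme, a length prefix, and leaves the payload bytes opaque. The 4-byte big-endian header is an assumption chosen for the example, not something prescribed by these notes.

```python
import struct

HEADER = struct.Struct(">I")  # assumed header: 4-byte big-endian payload length


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the receiver knows where it ends."""
    return HEADER.pack(len(payload)) + payload


def split_frames(stream: bytes) -> tuple[list[bytes], bytes]:
    """Return every complete payload in `stream` plus the leftover partial bytes."""
    frames = []
    offset = 0
    while len(stream) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(stream, offset)
        end = offset + HEADER.size + length
        if end > len(stream):
            break  # the rest of this frame has not arrived yet
        frames.append(stream[offset + HEADER.size:end])
        offset = end
    return frames, stream[offset:]


# Two complete frames followed by the first 6 bytes of a third one.
received = frame(b"hello") + frame(b"") + frame(b"world")[:6]
complete, leftover = split_frames(received)
# complete == [b"hello", b""]; leftover is kept until more bytes arrive
```

What goes inside each payload (JSON, CBOR, a custom layout) is a separate decision, which is why one word for both layers gets confusing.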

Text

UTF-8
  • Unicode Transformation Format – 8-bit

  • Size :

    • ASCII characters (0–127) use 1 byte

    • Non-ASCII characters use 2 to 4 bytes

    • For languages written mostly in non-ASCII characters (e.g., Chinese, Japanese), it can take more space than UTF-16: most of those characters need 3 bytes in UTF-8 but 2 in UTF-16

  • Web standard (used by HTML, JSON, XML, etc.)

  • Backward compatible with ASCII; valid ASCII text is valid UTF-8

  • Serialization:

    • UTF-8 can be viewed as a serialization of text: it turns a sequence of Unicode code points into a sequence of bytes
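The size rules can be checked directly with Python's built-in codecs. The sample characters are arbitrary picks, one per byte length.

```python
samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u4e2d",      # U+4E2D, a CJK ideograph
    "\U0001F600",  # U+1F600, an emoji outside the BMP
]

# Number of bytes each character occupies once encoded as UTF-8.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
# utf8_lengths == [1, 2, 3, 4]

# ASCII compatibility: pure ASCII text produces identical bytes in both encodings.
ascii_text = "hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
```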

UTF-16
  • Size :

    • BMP characters (Basic Multilingual Plane, U+0000 to U+FFFF) use 2 bytes

    • Characters outside BMP (e.g., emojis, historical scripts) use 4 bytes (surrogate pairs)

    • More compact than UTF-8 for text dominated by BMP characters above U+07FF (e.g., Chinese, Japanese, Korean), which need 3 bytes in UTF-8 but 2 in UTF-16

  • Widely used in some APIs and programming languages (e.g., Java, Windows, .NET)
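A short comparison of the two encodings, using the `utf-16-le` codec so that no byte order mark is added to the counts. The sample strings are arbitrary.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


cjk = "\u4e2d\u6587"  # two BMP characters (CJK)
latin = "hello"       # five ASCII characters
emoji = "\U0001F600"  # one character outside the BMP

cjk_sizes = (encoded_size(cjk, "utf-8"), encoded_size(cjk, "utf-16-le"))      # (6, 4)
latin_sizes = (encoded_size(latin, "utf-8"), encoded_size(latin, "utf-16-le"))  # (5, 10)

# Outside the BMP, UTF-16 emits a surrogate pair: two 16-bit units.
surrogates = emoji.encode("utf-16-be").hex()  # "d83dde00"
```

So which encoding is smaller depends on the text: UTF-16 wins for the CJK sample and loses for the ASCII one.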

UTF-32
  • Size : Every code point uses exactly 4 bytes, so indexing by code point is a simple offset calculation, at the cost of the largest size for most text
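Fixed width means the n-th code point always starts at byte `4 * n`. A minimal sketch (the helper name is made up for this note):

```python
def code_point_at(data: bytes, index: int) -> str:
    """Fetch the index-th code point from UTF-32-LE bytes by direct offset."""
    start = index * 4
    return data[start:start + 4].decode("utf-32-le")


text = "a\u4e2d\U0001F600"        # ASCII letter, CJK ideograph, emoji
data = text.encode("utf-32-le")   # 3 code points * 4 bytes = 12 bytes
third = code_point_at(data, 2)    # the emoji, found without scanning
```

With UTF-8 or UTF-16 the same lookup has to walk the bytes from the start, because earlier characters may be of different lengths.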

ASCII
  • American Standard Code for Information Interchange

  • Legacy system compatibility : For old systems or devices that only support ASCII

  • Simple English text : When text contains only basic characters (A–Z letters, 0–9 digits, basic punctuation)

  • Simplicity : ASCII is a 7-bit code (values 0–127), normally stored as 1 byte per character, which simplifies processing in very basic systems
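A small check of both points, using Python's built-in `ascii` codec. The helper name and sample strings are arbitrary.

```python
def to_ascii_bytes(text: str) -> bytes:
    """Encode text as ASCII; raises UnicodeEncodeError on any non-ASCII character."""
    return text.encode("ascii")


encoded = to_ascii_bytes("Score: 42")                  # 9 characters -> 9 bytes
all_seven_bit = all(byte < 128 for byte in encoded)    # every value fits in 7 bits

try:
    to_ascii_bytes("caf\u00e9")  # contains U+00E9, which ASCII cannot represent
    rejected = False
except UnicodeEncodeError:
    rejected = True
```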